(54) Method and apparatus for voice interaction over a network using parameterized interaction
definitions



(57) An audio browsing adjunct (150) executes a 
voice markup language browser. The audio browsing 
adjunct (150) receives a voice interactive request. 
Based on the request, the network node obtains a doc- 
ument. The document includes a voice markup, and a 
parameterized interaction definition or at least one link 
to a parameterized interaction definition when user 
interaction is required. The audio browsing adjunct 
(150) interprets the document in accordance with the 
parameterized interaction definition. By using the 
parameterized interaction definition, entered data is typ- 
ically verified at the audio browsing adjunct (150) 
instead of at a network server. Further, the parameter- 
ized interaction definition can define a finite state 
machine. When it does, the parameterized interaction 
definition can be analyzed so that performance problems
of the audio browsing adjunct (150) are minimized.
Description

FIELD OF THE INVENTION 

The present invention is directed to voice interaction
over a network. More particularly, the present invention
is directed to voice interaction over a network
utilizing parameterized interaction definitions.

BACKGROUND OF THE INVENTION

The amount of information available over communication
networks is large and growing at a fast rate. The
most popular of such networks is the Internet, which is
a network of linked computers around the world. Much
of the popularity of the Internet may be attributed to the
World Wide Web (WWW) portion of the Internet. The
WWW is a portion of the Internet in which information is
typically passed between server computers and client
computers using the Hypertext Transfer Protocol
(HTTP). A server stores information and serves (i.e.,
sends) the information to a client in response to a
request from the client. The clients execute computer
software programs, often called browsers, which aid in
the requesting and displaying of information. Examples
of WWW browsers are Netscape Navigator, available
from Netscape Communications, Inc., and the Internet
Explorer, available from Microsoft Corp.

Servers, and the information stored therein, are
identified through Uniform Resource Locators (URLs).
URLs are described in detail in Berners-Lee, T., et al.,
Uniform Resource Locators, RFC 1738, Network Working
Group, 1994, which is incorporated herein by reference.
For example, the URL
http://www.hostname.com/document1.html identifies
the document "document1.html" at host server
"www.hostname.com". Thus, a request for information
from a host server by a client generally includes a URL.
The information passed from a server to a client is generally
called a document. Such documents are generally
defined in terms of a document language, such as
Hypertext Markup Language (HTML). Upon request
from a client, a server sends an HTML document to the
client. HTML documents contain information that is
interpreted by the browser so that a representation can
be shown to a user at a computer display screen. An
HTML document may contain information such as text,
logical structure commands, hypertext links, and user
input commands. If the user selects (for example by a
mouse click) a hypertext link from the display, the
browser will request another document from a server.
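As a minimal sketch of how the example URL above decomposes into a host server and a document name, the following uses Python's standard `urllib.parse` module. The helper name `split_url` is hypothetical and is not part of the disclosure.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str]:
    """Split a URL into (host server, document name). Hypothetical helper."""
    parts = urlsplit(url)
    # The hostname identifies the server; the path, minus its leading "/",
    # names the document stored on that server.
    return parts.hostname or "", parts.path.lstrip("/")


host, document = split_url("http://www.hostname.com/document1.html")
print(host, document)
```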

Currently, most WWW browsers are based upon
textual and graphical user interfaces. Thus, documents
are presented as images on a computer screen. Such
images include, for example, text, graphics, hypertext
links, and user input dialog boxes. Most user interaction
with the WWW is through a graphical user interface.
Although audio data is capable of being received and
played back at a user computer (e.g., a .wav or .au file),
such receipt of audio data is secondary to the graphical
interface of the WWW. Thus, with most WWW browsers,
audio data may be sent as a result of a user request, but
there is no means for a user to interact with the WWW
using an audio interface.

An audio browsing system is disclosed in U.S. Patent
Application No. 08/635,601, assigned to AT&T Corp.
and entitled Method and Apparatus for Information
Retrieval Using Audio Interface, filed on April 22, 1996,
incorporated herein by reference (hereinafter referred to
as the "AT&T audio browser patent"). The disclosed
audio browsing system allows a user to access documents
on a server computer connected to the Internet
using an audio interface device.

In one embodiment disclosed in the AT&T audio 
browser patent, an audio interface device accesses a 
centralized audio browser that is executed on an audio 
browsing adjunct. The audio browser receives docu- 
ments from server computers that can be coupled to the 
Internet. The documents may include specialized 
instructions that enable them to be used with the audio 
interface device. The specialized instructions typically 
are similar to HTML. The specialized instructions may 
cause the browser to generate audio output from written 
text, or accept an input from the user through DTMF 
tones or automated speech recognition. 

A problem that arises with an audio browsing sys- 
tem that includes a centralized browser is that the input 
of user data often requires a complex sequence of 
events involving the user and the browser. These events 
include, for example: a) prompting the user for input; b) 
enumerating the input choices; c) prompting the user for 
additional input; and d) informing the user that a previ- 
ous input was wrong or inconsistent. We have found 
that it is desirable to program and customize the central- 
ized browser in order to define the allowed sequences 
of events that can occur when the user interacts with the 
browser. However, when programming and customizing 
the browser, it is important to minimize certain perform- 
ance problems that result from both inadvertently erro- 
neous and malicious programming. 

One such problem is that a browser that has been 
customized can become unresponsive if the customiza- 
tion contains, for example, an infinite loop. In addition to 
reducing the performance of the browser, to the detri- 
ment of other activity being performed by the browser, 
such a loop could allow a telephone call to extend over 
more time, disadvantageously adding to the cost of the
call while at the same time potentially denying other 
callers access to the browser. 

Another problem, known as a "denial of service" 
attack, is easier for the attacker to execute if the browser 
is customized in a way that allows a caller to keep the 
call connected without offering any input. 

Some of these performance problems are less 
important in the context of non-centralized browsers,
because non-centralized browsers that have been
poorly customized typically affect only the computer that
is executing the browser and the computer's telephone 
lines, and therefore programming errors are effectively 
quarantined. 

However, in the centralized browser embodiment of
the audio browsing system disclosed in the AT&T audio
browser patent, and in any centralized browser, when
the audio browsing adjunct that is executing the centralized
browser incurs performance problems, the negative
effects of the problems are exacerbated. In an audio
browsing system, multiple users access the same audio
browsing adjunct through multiple audio interface
devices and thus many users are negatively affected
when the audio browsing adjunct incurs performance
problems. Therefore, it is desirable in an audio browsing
system to minimize performance problems.

Another problem with most known browsers is that
data entered on the browser at the client computer is
typically sent to the server where verification and validation
of the data is performed. For example, if a user
enters data through a keyboard into a computerized fill-in
form on a browser, that data is typically sent to the
Internet server where it is verified that the form was
properly filled out (i.e., all required information has been
entered, the required number of digits have been
entered, etc.). If the form was not properly filled out, the
server typically sends an error message to the client,
and the user will attempt to correct the errors.

However, in an audio browser system, frequently
the data entered by the user is in the form of speech.
The speech is converted to voice data or voice files
using speech recognition. However, using speech recognition
to obtain voice data is not as accurate as
obtaining data through entry via a keyboard. Therefore,
even more verification and validation is required when
data is entered using speech recognition. Further,
voice files converted from speech are typically large relative
to data entered from a keyboard, and this makes it
difficult to frequently send voice files from the audio
browsing adjunct to the Internet server. Therefore, it is
desirable to do as much verification and validation as
possible of entered data at the browser in an audio
browser system so that the number of times that the
voice data is sent to the Internet server is minimized.

Based on the foregoing, there is a need for an audio
browser system in which performance problems of the
audio browsing adjunct executing the browser are minimized,
and in which entered data is typically verified
and validated at the browser instead of at the Internet
server.

SUMMARY OF THE INVENTION

In accordance with one embodiment of the present
invention, an audio browsing adjunct executes a voice
markup language browser. The audio browsing adjunct
receives a voice interactive request. Based on the
request, the network node obtains a document. The
document includes a voice markup, and, when user
interaction is required, a parameterized interaction definition
or at least one link to a parameterized interaction
definition. The audio browsing adjunct interprets the
document in accordance with the parameterized interaction
definition.

By using the parameterized interaction definition, 
entered data is typically verified at the audio browsing 
adjunct instead of at a network server. Further, in one 
embodiment the parameterized interaction definition 
defines a finite state machine. In this embodiment, the 
parameterized interaction definition can be analyzed so 
that performance problems of the audio browsing 
adjunct are minimized. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows a diagram of a telecommunications 
system which is suitable to practice one embodiment of 
the present invention. 

Fig. 2 illustrates the general form of a parameter- 
ized interaction definition. 

Figs. 3A, 3B and 3C are an example of a parame- 
terized interaction definition. 

DETAILED DESCRIPTION 

Fig. 1 shows a diagram of a telecommunications 
system which is suitable to practice one embodiment of 
the present invention. An audio interface device, such 
as telephone 110, is connected to a local exchange car- 
rier (LEC) 120. Audio interface devices other than a tel- 
ephone may also be used. For example, the audio 
interface device could be a multimedia computer having 
telephony capabilities. In one embodiment, a user of tel- 
ephone 110 requests information by placing a tele- 
phone call to a telephone number associated with 
information provided by a document server, such as 
document server 160. A user can also request informa- 
tion using any device functioning as an audio interface 
device, such as a computer. 

In the embodiment shown in Fig. 1, the document
server 160 is part of communication network 162. In an
advantageous embodiment, network 162 is the Internet.
Telephone numbers associated with information accessible
through a document server, such as document
server 160, are set up so that they are routed to special
telecommunication network nodes, such as audio
browsing adjunct 150.

In the embodiment shown in Fig. 1, audio browsing
adjunct 150 is a node in telecommunications network
102, which is a long distance telephone network. Thus,
the call is routed to the LEC 120, which further routes
the call to a long distance carrier switch 130 via trunk
125. Long distance network 102 would generally have
other switches similar to switch 130 for routing calls.
However, only one switch is shown in Fig. 1 for clarity. It
is noted that switch 130 in the telecommunications network
102 is an "intelligent" switch, in that it contains (or
is connected to) a processing unit 131 which may be
programmed to carry out various functions. Such use of
processing units in telecommunications network
switches, and the programming thereof, is well known in
the art.

Upon receipt of the call at switch 130, the call is
then routed to the audio browsing adjunct 150. Thus,
there is established an audio channel between telephone
110 and audio browsing adjunct 150. The routing
of calls through a telecommunications network is well
known in the art and will not be described further herein.

Upon receipt of the call and the request from telephone
110, the audio browsing adjunct 150 establishes
a communication channel with the document server 160
associated with the called telephone number via link
164. In a WWW embodiment, link 164 is a socket connection
over TCP/IP, the establishment of which is well
known in the art. For additional information on TCP/IP,
see Comer, Douglas, Internetworking with TCP/IP: Principles,
Protocols, and Architecture, Englewood Cliffs,
NJ, Prentice Hall, 1988, which is incorporated by reference
herein. Audio browsing adjunct 150 and the document
server 160 communicate with each other using a
document serving protocol. As used herein, a document
serving protocol is a communication protocol for the
transfer of information between a client and a server. In
accordance with such a protocol, a client requests information
from a server by sending a request to the server
and the server responds to the request by sending a
document containing the requested information to the
client. Thus, a document serving protocol channel is
established between audio browsing adjunct 150 and
the document server 160 via link 164. In an advantageous
WWW embodiment, the document serving protocol
is the Hypertext Transfer Protocol (HTTP). This
protocol is well known in the art of WWW communication
and is described in detail in Berners-Lee, T. and
Connolly, D., Hypertext Transfer Protocol (HTTP) Working
Draft of the Internet Engineering Task Force, 1993,
which is incorporated herein by reference.
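The request half of such a document serving exchange can be sketched as follows. This is an illustration only, assuming an HTTP/1.0-style GET request; the function name `build_get_request` is hypothetical, and the sketch builds the request bytes without opening a socket over link 164.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build the bytes of a simple HTTP/1.0 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, document path, version
        f"Host: {host}",         # names the document server being addressed
        "",                      # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.hostname.com", "/document1.html")
print(request.decode("ascii"))
```

A client such as the audio browsing adjunct would write these bytes to the connection and then read back the document sent by the server.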

Thus, the audio browsing adjunct 150 communicates
with the document server 160 using the HTTP
protocol. As far as the document server 160 is
concerned, it behaves as if it were communicating with
any conventional WWW client executing a conventional
graphical browser. Accordingly, the document server 160
serves documents to the audio browsing adjunct 150 in
response to requests it receives over link 164. A document,
as used herein, is a collection of information. The
document may be a static document in that the document
is pre-defined at the server 160 and all requests
for that document result in the same information being
served. Alternatively, the document could be a dynamic
document, whereby the information which is served in
response to a request is dynamically generated at the
time the request is made. Typically, dynamic documents
are generated by scripts, which are programs executed
by the server 160 in response to a request for information.
For example, a URL may be associated with a
script. When the server 160 receives a request including
that URL, the server 160 will execute the script to
generate a dynamic document, and will serve the
dynamically generated document to the client which
requested the information. Dynamic scripts are typically
executed using the Common Gateway Interface (CGI).
The use of scripts to dynamically generate documents
is well known in the art.
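The distinction between static and dynamic documents can be illustrated with a small sketch. It assumes a hypothetical routing table in which a path maps either to fixed text (a static document) or to a function standing in for a script; none of these names come from the disclosure, and no real CGI machinery is involved.

```python
from typing import Callable, Union

Document = str
# A route is either a pre-defined document or a script that generates one.
Handler = Union[Document, Callable[[dict], Document]]


def serve(routes: dict[str, Handler], path: str, query: dict) -> Document:
    """Return the document for a path, running its script if it has one."""
    handler = routes[path]
    if callable(handler):
        return handler(query)  # dynamic: generated at request time
    return handler             # static: same information for every request


def balance_script(query: dict) -> Document:
    # Hypothetical script; the account parameter is an assumption.
    return f"Balance for account {query['account']} is available."


routes = {"/hours.html": "We are open 9 to 5.", "/balance": balance_script}
print(serve(routes, "/hours.html", {}))
print(serve(routes, "/balance", {"account": "1234"}))
```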

As will further be described below, in accordance 
with the present invention, the documents served by the 
server 160 include voice markups which are instructions 
that are interpreted by the audio browsing adjunct 150. 
In order to facilitate interaction between the user of the 
telephone 110 and audio browsing adjunct 150, in one
embodiment the voice markups include links to param- 
eterized interaction definitions. Details of parameterized 
interaction definitions will be described below. When the 
links are interpreted by the audio browsing adjunct 150, 
the appropriate parameterized interaction definitions 
are invoked. In another embodiment, the parameterized 
interaction definitions are included within the document. 

In one embodiment, the voice markups and the 
parameterized interaction definitions are written in a 
language based on HTML but specially tailored for 
audio browsing adjunct 150. One example of HTML-like 
voice markup instructions is "audio-HTML", described in 
the AT&T audio browser patent. 

When an HTML document is received by a client 
executing a conventional WWW browser, the browser 
interprets the HTML document into an image and dis- 
plays the image upon a computer display screen. How- 
ever, in the audio browsing system shown in Fig. 1, 
upon receipt of a document from document server 160, 
the audio browsing adjunct 150 converts some of the 
voice markup instructions located in the document into 
audio data in a known manner, such as using text to 
speech. Further details of such conversion are 
described in the AT&T audio browser patent. The audio 
data is then sent to telephone 110 via switch 130 and 
LEC 120. Thus, in this manner, the user of telephone 
110 can access information from document server 160 
via an audio interface. 

In addition, the user can send audio user input from 
the telephone 110 back to the audio browsing adjunct 
150. This audio user input may be, for example, speech 
signals or DTMF tones. The audio browsing adjunct 150 
converts the audio user input into user data or instruc- 
tions which are appropriate for transmitting to the docu- 
ment server 160 via link 164 in accordance with the 
HTTP protocol in a known manner. Further details of 
such conversion are described in the AT&T audio 
browser patent. The user data or instructions are then 
sent to the document server 160 via the document serv- 
ing protocol channel. Thus, user interaction with the 
document server is via an audio user interface. 

Parameterized interaction definitions are pre-defined
routines that specify how input is collected from
the user via the audio interface device 110 through
prompts, feedbacks, and timeouts. The parameterized
interaction definitions are invoked by specific voice
markup instructions in documents when the documents
are interpreted by the audio browser (referred to as the
"voice markup language" (VML) browser) executing on
the audio browsing adjunct 150. In one embodiment,
the instructions define links to parameterized interaction
definitions. The parameterized interaction definitions
can be located within the document or elsewhere within
the audio browsing system shown in Fig. 1 (e.g., at document
server 160, at audio browsing adjunct 150, or at
any other storage device coupled to audio browsing
adjunct 150). In one embodiment, parameterized interaction
definitions are stored on a database coupled to
an interaction definition server. The interaction definition
server is coupled to the VML browser so that the parameterized
interaction definitions are available to the VML
browser when requested. In addition, the parameterized
interaction definitions may be part of the voice markup
instructions, in which case a link is not required.

For example, a parameterized interaction definition
may exist that enables a user to make one choice out of
a list of menu options. This parameterized interaction
definition might be entitled "MENU_INTERACT". If a
document includes a section where such an interaction
is required, a voice markup instruction can be written
that invokes this interaction such as "Call
MENU_INTERACT, parameter 1, parameter 2". This
voice markup, when it is interpreted by the VML
browser, would invoke the parameterized interaction
definition entitled "MENU_INTERACT", and pass to it
parameters 1 & 2.
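The invocation "Call MENU_INTERACT, parameter 1, parameter 2" can be pictured as a lookup by name followed by a call with the supplied parameters. The sketch below is only an analogy in Python: the registry, the `define` and `call` helpers, and the choice of a prompt and an option list as the two parameters are all assumptions, not the VML browser's actual mechanism.

```python
from typing import Callable

# Hypothetical registry mapping interaction names to their definitions.
REGISTRY: dict[str, Callable[..., str]] = {}


def define(name: str):
    """Register a function as the interaction definition called `name`."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@define("MENU_INTERACT")
def menu_interact(prompt: str, options: list[str]) -> str:
    # Build the text that would be synthesized into speech for the caller.
    numbered = ", ".join(f"{i} for {opt}" for i, opt in enumerate(options, start=1))
    return f"{prompt} Press {numbered}."


def call(name: str, *params) -> str:
    """Invoke the named interaction definition, passing it the parameters."""
    return REGISTRY[name](*params)


spoken = call("MENU_INTERACT", "Choose a topic.", ["news", "weather"])
print(spoken)
```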

The parameterized interaction definitions are what
enable the present invention to achieve the previously
described benefits (i.e., minimize performance problems
of the audio browsing adjunct, and verify and validate
entered data at the audio browsing adjunct instead
of at the Internet server). The parameterized interaction
definitions tailor and modify the behavior of the centralized
audio browser to achieve these benefits.

Specifically, in one embodiment, the parameterized
interaction definitions define finite state machines. It is
well known that finite state machines can be completely
analyzed before being executed using known techniques.
The analysis can determine, for example,
whether the parameterized interaction definition will terminate
if the user does not hang up and does not offer
any input. This prevents a user from tying up the VML
browser indefinitely by doing nothing. Further, the analysis
can determine if all sections or states of the parameterized
interaction definition can be reached by the
user. Further, the analysis can determine if the parameterized
interaction definition includes sections or states
that do not lead to an exit point, which would cause an
infinite loop. These states can be revised or eliminated
before the parameterized interaction definition is interpreted
or executed by the VML browser or the audio
browsing adjunct 150. Because of the availability of
these analysis tools, a developer of an audio browser
document that uses parameterized interaction definitions
can be assured that disruptions to the browser will
be minimized by implementing the analyzed interaction
definitions when the document requires user interaction.
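Two of the analyses mentioned above, finding states the user can never reach and finding states that never lead to an exit point, reduce to graph reachability. The following is a minimal sketch under the assumption that an interaction definition has already been reduced to a mapping from each state to its possible next states; the state names "loop", "orphan" and "done" are made up for the example.

```python
from collections import deque


def reachable(graph: dict, start: str) -> set:
    """Breadth-first search: every state reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def analyze(graph: dict, initial: str, exits: set) -> dict:
    """Report unreachable states and reachable states with no path to an exit."""
    all_states = set(graph) | {t for targets in graph.values() for t in targets}
    from_initial = reachable(graph, initial)
    # Reverse the edges so a search from each exit finds states that can exit.
    reverse = {s: set() for s in all_states}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)
    can_exit = set()
    for exit_state in exits:
        can_exit |= reachable(reverse, exit_state)
    return {
        "unreachable": all_states - from_initial,
        "trapped": from_initial - can_exit,
    }


graph = {
    "initial": ["echochoice", "help", "loop"],
    "help": ["initial"],
    "echochoice": ["done"],
    "loop": ["loop"],    # hypothetical state that never exits
    "orphan": ["done"],  # hypothetical state no transition leads to
}
report = analyze(graph, "initial", {"done"})
print(report)
```

A state listed under "trapped" corresponds to the infinite-loop case described above and could be revised or eliminated before the definition is interpreted.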

Further, the parameterized interaction definitions 
provide verification of the user's input. Therefore, 
because the parameterized interaction definitions are 
interpreted at the audio browsing adjunct 150, there is a
minimal need for user input to be sent to the Internet 
server for verification. This saves time and telecommu- 
nication costs because user input frequently consists of 
relatively large voice files. 

Examples of some of the possible types of parame- 
terized interaction definitions include: 

a) menu, where the user is to make one choice out
of a list of menu options;

b) multimenu, where the user selects a subset of
options;

c) text, where the user must provide a string of
characters;

d) digits, where the user must provide a sequence of
digits, whose length is not determined a priori;

e) digitslimited, where the user must input a predetermined
number of digits; and

f) recording, where the user's voice is recorded to
an audio file.
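The kind of check that each interaction type allows the audio browsing adjunct to perform locally can be sketched as follows. The function `validate` and its keyword arguments are hypothetical; the sketch covers types a) through e) only, since a recording is an audio file rather than a value to be checked, and it assumes that an empty subset is acceptable for multimenu.

```python
def validate(kind: str, value, *, options=(), length=None) -> bool:
    """Check user input against an interaction type (illustrative rules only)."""
    if kind == "menu":
        return value in options                # exactly one listed option
    if kind == "multimenu":
        return set(value) <= set(options)      # any subset of the options
    if kind == "text":
        return isinstance(value, str) and len(value) > 0
    if kind in ("digits", "digitslimited"):
        is_digits = isinstance(value, str) and value.isascii() and value.isdigit()
        if kind == "digits":
            return is_digits                   # length not fixed in advance
        return is_digits and len(value) == length
    raise ValueError(f"unknown interaction type: {kind}")


print(validate("menu", "2", options=("1", "2", "3")))
print(validate("digitslimited", "1234", length=5))
```

Input that fails such a check can be rejected at the adjunct without being sent to the server.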

Fig. 2 illustrates the general form of a parameter- 
ized interaction definition. 

Line 200 defines an interaction named
"interaction_name" for interaction type
"interaction_type." In addition, line 200 declares all
media that may be used in the interaction. The media
declared in line 200 includes automatic speech recognition
(ASR), touch tones or DTMF (TT), and recording
(REC).

Line 202 defines a number of attribute parameters.
Attribute parameters are used to parameterize the interaction
and are included in the voice markup instructions
that invoke the interaction. If no parameters are
included in the voice markup instructions, a default
value, "default_value", is used as the parameter.

Line 204 defines a number of message
parameters. Message parameters can be used as formal
placeholders within the state machine to accommodate
prompts and messages specified when using the
interaction. Message parameters are also used to
parameterize the interaction and are included in the
voice markup instructions that invoke the interaction.

Line 206 defines a number of counter variable dec- 
larations. Each counter is declared with an initial value. 
Operations allow this variable to be decremented from a 
fixed initial value (typically less than 10) and tested for 0. 

Line 208 defines a number of Boolean variable declarations.
Each Boolean variable is declared with an initial
value.
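The declarations of lines 200 through 208 can be pictured as a record with named parts. The sketch below is one possible in-memory representation, not the format of Fig. 2 itself; the class name, field names and the `bind` method are assumptions. It shows how attribute defaults apply when the invoking voice markup instruction supplies no value.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionDefinition:
    """Hypothetical in-memory form of the declarations in lines 200 to 208."""
    name: str
    interaction_type: str
    media: tuple = ()                               # e.g. ("ASR", "TT", "REC")
    attributes: dict = field(default_factory=dict)  # attribute name -> default
    messages: dict = field(default_factory=dict)    # message name -> text
    counters: dict = field(default_factory=dict)    # counter name -> initial value
    booleans: dict = field(default_factory=dict)    # variable name -> initial value

    def bind(self, **overrides) -> dict:
        """Merge invocation parameters over the declared defaults."""
        unknown = set(overrides) - set(self.attributes)
        if unknown:
            raise KeyError(f"undeclared attribute(s): {sorted(unknown)}")
        return {**self.attributes, **overrides}


definition = InteractionDefinition(
    name="menudefault",
    interaction_type="menu",
    media=("TT",),
    attributes={"MAXTTERROR": "3", "MAXTO": "2"},
)
print(definition.bind())           # defaults used as-is
print(definition.bind(MAXTO="5"))  # one default overridden at invocation
```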

Line 210 defines a number of state declarations.
Each state contains one of the following constructs:

1) An action, which consists of a message synthesized
into speech and code to change the state,
either immediately or as a result of events enabled.
Also specified are the input modes that are activated.
For example, the input mode ttmenu, which
is defined for interactions of type menu, specifies
that events designating the choice of an option can
occur as a result of the user entering a digit. Each
event is mentioned in an event transition, which
specifies the side-effects to be effectuated when
the event occurs; or

2) A conditional expression, which allows the action
to depend on the settings of variables. Thus a conditional
expression consists of actions that are
embedded in if-then-else constructs.

An interaction defined in the language previously
described can be regarded as a finite-state machine
whose total state space is the product of the current
state and the values of the various variables.
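Because counters only decrease from a small fixed initial value and Boolean variables have two values, this product is finite and can be enumerated directly. The sketch below assumes each counter ranges over 0 through its initial value; the function name and the particular list of states are illustrative.

```python
from itertools import product


def total_state_space(states, counters, booleans):
    """Enumerate (state, counter values..., boolean values...) combinations."""
    # A counter decremented from its initial value takes values initial..0.
    counter_ranges = [range(initial + 1) for initial in counters.values()]
    boolean_ranges = [(False, True)] * len(booleans)
    return list(product(states, *counter_ranges, *boolean_ranges))


states = ["initial", "echochoice", "help", "notvalid", "inactivity"]
counters = {"TTERRCOUNT": 3, "TOCOUNT": 2}
space = total_state_space(states, counters, booleans=[])
print(len(space))  # 5 states x 4 values x 3 values
```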

Figs. 3A, 3B and 3C are an example of a parameterized
interaction definition. Referring to Fig. 3A, line
300 defines the interaction type as menu and a parameterized
interaction name. Line 302 defines the
attribute parameters. Lines 304 and 306 define counter
variables. Lines 308, 310, 312, 314, 316 and 318 indicate
the beginning of message parameters.

Referring to Fig. 3B, lines 320, 322 and 324 indicate
the beginning of various states.

Referring to Fig. 3C, lines 326, 328, 330 indicate 
the beginning of various states. Finally, line 332 indi- 
cates the end of the interaction definition. 

More details of the "initial" state that begins on line
320 of Fig. 3B will be described. The other states shown
in Figs. 3B and 3C function similarly.

Initially, the state machine associated with the interaction
is in state "initial" and the two counter variables
TTERRCOUNT and TOCOUNT are initialized to MAXTTERROR
and MAXTO, respectively. These values, if
not explicitly overridden by parameters when the interaction
definition is used, are 3 and 2, respectively. The
state "initial" specifies that the message PROMPT
(which is typically a parameter whose actual value is the
text in the voice markup document preceding the use of
the interaction) is to be synthesized while touchtone
command mode (TT) and touchtone menu selection
mode (TTMENU) are activated. These activations enable
the events TTMENU COLLECT and TT
INPUT="HELPTT", respectively, to occur. The first kind
of event denotes a digit input specifying a menu option
selection. The second kind of event specifically refers to
the input "HELPTT" (whose default is "##"). If an event
of the first kind happens, then the next state of the finite-state
machine will be "echochoice". If the second event
occurs first, then the next state will be "help". If a meaningless
touchtone occurs, then the event transition
involving the event TTFAIL specifies that TTERRCOUNT
is to be decremented and that the next state is
"notvalid".

If none of these three events occur within a period 
of time designated by "INACTIVITYTIME", then event 
TIMEOUT happens, TTERRCOUNT is decremented, 
and the next state is "inactivity". 
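The four event transitions described for the "initial" state can be written as a lookup table. The following is a sketch of that table only, following the text as written; the `step` function and the dictionary layout are assumptions, and the behavior when a counter reaches 0 is not modeled because it is defined elsewhere in Figs. 3B and 3C.

```python
# (current state, event) -> (next state, counter to decrement or None)
TRANSITIONS = {
    ("initial", "TTMENU COLLECT"): ("echochoice", None),
    ("initial", 'TT INPUT="HELPTT"'): ("help", None),
    ("initial", "TTFAIL"): ("notvalid", "TTERRCOUNT"),
    ("initial", "TIMEOUT"): ("inactivity", "TTERRCOUNT"),
}


def step(state: str, event: str, counters: dict):
    """Apply one event transition and return (next state, updated counters)."""
    next_state, counter = TRANSITIONS[(state, event)]
    updated = dict(counters)  # leave the caller's counters untouched
    if counter is not None:
        updated[counter] -= 1
    return next_state, updated


start = {"TTERRCOUNT": 3, "TOCOUNT": 2}  # defaults MAXTTERROR and MAXTO
print(step("initial", "TTMENU COLLECT", start))
print(step("initial", "TTFAIL", start))
```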

As described, the VML browser of the present 
invention interprets documents in accordance with 
parameterized interaction definitions. The parameter- 
ized interaction definitions enable an audio browsing 
system to minimize performance problems of the audio 
browsing adjunct and verify entered data at the audio 
browsing adjunct instead of at an Internet server. 

Further, the parameterized interaction definitions 
establish a dialog for the input of data into a field (i.e. the 
"HELPTT" field) where sequences of user input and 
system responses can be specific and controlled. Each 
user generated event such as a key press or an utterance
by the user is controlled and responded to by the 
parameterized interaction definitions. 

The foregoing Detailed Description is to be under- 
stood as being in every respect illustrative and exem- 
plary, but not restrictive, and the scope of the invention 
disclosed herein is not to be determined from the 
Detailed Description, but rather from the claims as inter- 
preted according to the full breadth permitted by the pat- 
ent laws. It is to be understood that the embodiments 
shown and described herein are only illustrative of the 
principles of the present invention and that various mod- 
ifications may be implemented by those skilled in the art 
without departing from the scope and spirit of the inven- 
tion. For example, the audio browsing system shown in 
Fig. 1 executes the VML browser as a centralized 
browser at audio browsing adjunct 150. However, the 
present invention can also be implemented with other 
embodiments of an audio browsing system, including all 
embodiments disclosed in the AT&T audio browser pat- 
ent. 

Claims 

1. A method of operating an audio browsing 
adjunct, comprising the steps of: 

(a) receiving a request; 

(b) obtaining a document based upon the 
request, wherein the document includes a 
voice markup; and

(c) interpreting the document in accordance 
with a parameterized interaction definition. 

2. The method of claim 1, wherein the request is
received over a public switched telephone network.

3. The method of claim 1, wherein the request is
received over a data network.

4. The method of claim 1, wherein the document is
obtained from a server connected to a data network.

5. The method of claim 1, wherein the parameterized
interaction definition is located in the document.

6. The method of claim 1, wherein the parameterized
interaction definition is located on a server
coupled to a data network.

7. The method of claim 6, wherein the document is
interpreted on a voice markup language (VML)
browser coupled to the data network and a public
switched telephone network based upon the request
received by the VML browser.

14. An audio browsing system on a network, comprising
an audio browsing adjunct coupled to the
network and executing a voice markup language
(VML) browser, said VML browser adapted to
receive a request, obtain a VML document through
the network, and interpret the VML document in
accordance with a parameterized interaction definition.

15. The audio browsing system of claim 14, further
comprising an interaction definition server coupled
to said browser, said server adapted to receive a
request for said parameterized interaction definition
from said browser, and to send said requested
interaction definition to said browser.

16. The audio browsing system of claim 15, further
comprising a database coupled to said server, said
database storing said interaction definitions, said
server obtaining said interactive definitions from
said database.

17. The audio browsing system of claim 14,
wherein the request is received over a public
switched telephone network.

18. The audio browsing system of claim 14,
wherein the request is received over a data network.

19. The audio browsing system of claim 14,
wherein the parameterized interaction definition is
located in the VML document.

20. The audio browsing system of claim 14,
wherein the parameterized interaction definition is
located on a server coupled to the network.

21. The audio browsing system of claim 14,
wherein the parameterized interaction definition
defines a finite state machine.



EP 0 878 948 A2 



FIG. 2 

200 <INTERACTION TYPE=interaction_type NAME=interaction_name [ASR] 
        [TT] [REC]> 

202 <ATTRIBUTES param_name [=default_value] 
        param_name [=default_value]> 

204 <MESSAGE msg_name> . . . </MESSAGE> 

    • • • 

206 <COUNTER counter_name=initial_value> 

    • • • 

208 <BOOLEAN boolean_variable=initial_value> 

210 <STATE> 
        [action | 
        conditional_expr] 

212 </STATE> 

    • • • 

214 </INTERACTION> 
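
The template of FIG. 2 can be read as a small data model: an interaction has a type, a name, input modes, attributes with default values, messages, counters, Boolean variables, and states. The Python sketch below is one hypothetical way to hold such a definition in memory and to bind its parameters. The names `InteractionDefinition`, `bind_attributes` and `initial_counters` are illustrative assumptions and do not appear in the specification; the sample values are borrowed from FIG. 3A, and the sketch assumes that a counter's initial value names an attribute.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionDefinition:
    """In-memory form of a FIG. 2 style definition (names are illustrative)."""
    kind: str                                       # TYPE=..., e.g. "menu"
    name: str                                       # NAME=...
    modes: set = field(default_factory=set)         # subset of {"ASR", "TT", "REC"}
    attributes: dict = field(default_factory=dict)  # parameter name -> default value
    messages: dict = field(default_factory=dict)    # message name -> text
    counters: dict = field(default_factory=dict)    # counter name -> initial value
    booleans: dict = field(default_factory=dict)    # variable name -> initial value
    states: dict = field(default_factory=dict)      # state name -> actions


def bind_attributes(definition, overrides=None):
    """Return the attribute defaults with caller-supplied overrides applied.

    Unknown parameter names are rejected so that a misspelled parameter
    is caught before the interaction starts.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(definition.attributes)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**definition.attributes, **overrides}


def initial_counters(definition, bound):
    """Resolve each counter's initial value from the bound attributes.

    Assumption: the initial value names an attribute, as in
    <COUNTER TTERRCOUNT="MAXTTERROR"> of FIG. 3A.
    """
    return {name: int(bound[attr]) for name, attr in definition.counters.items()}


menu = InteractionDefinition(
    kind="menu",
    name="menudefault",
    modes={"TT"},
    attributes={"TIMEOUT": "3", "INACTIVITYTIME": "10",
                "MAXTTERROR": "3", "MAXTO": "2"},
    counters={"TTERRCOUNT": "MAXTTERROR", "TOCOUNT": "MAXTO"},
)

# A document that uses the definition overrides only what it needs.
bound = bind_attributes(menu, {"INACTIVITYTIME": "5"})
counters = initial_counters(menu, bound)
print(bound["INACTIVITYTIME"], bound["TIMEOUT"], counters)
```

Keeping the defaults in the definition and the overrides in the document is what makes the definition reusable: one stored definition can serve many documents that differ only in parameter values.
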
FIG. 3A 



300 <INTERACTION TYPE="menu" NAME="menudefault" TT> 
302 <ATTRIBUTES 
        TIMEOUT="3" 
        PROMPTDELAY="1.5" 
        HESITATIONDELAY="2" 
        INACTIVITYTIME="10" 
        RESETTT="*" 
        HELPTT="#" 
        MAXTTERROR="3" 
        MAXTO="2"> 

304 <COUNTER TTERRCOUNT="MAXTTERROR"> 
306 <COUNTER TOCOUNT="MAXTO"> 

308 <MESSAGE HESITATION> 
        <ENUM><SAY VALUE=MENUNO>: <SAY VALUE=MENUITEM></ENUM> 
    </MESSAGE> 

310 <MESSAGE HELP> 
        There are <SAY VALUE=MENULENGTH> choices. 
        <ENUM> To select <SAY VALUE=MENUITEM>, press <SAY VALUE=MENUNO>. </ENUM> 
        To obtain this help message, press <SAY VALUE=HELPTT>. 
    </MESSAGE> 

312 <MESSAGE ECHOCHOICE> 
        <SAY VALUE=MENULASTCHOICE> Press <SAY VALUE=RESETTT> to cancel. 
    </MESSAGE> 

314 <MESSAGE NOTVALIDTTMSG> This key combination doesn't make 
        sense. </MESSAGE> 

316 <MESSAGE MAXTERRORMSG> Too many errors. </MESSAGE> 

318 <MESSAGE MAXTOERRORMSG> Sorry we didn't recognize any 
        input. </MESSAGE> 






FIG. 3B 



320 <STATE NAME="initial"> 
        <MESSAGE PROMPT> 
        <MODES> <TT> <TTMENU> 
        <EVENTS> 
            <TTMENU COLLECT STATE="echochoice"> 
            <TT INPUT="HELPTT" STATE="help"> 
            <TTFAIL DECREMENT=TTERRCOUNT STATE="notvalid"> 
            <TIMEOUT TIME="INACTIVITYTIME" DECREMENT=TOCOUNT 
                STATE="inactivity"> 
        </EVENTS> 

322 <STATE NAME="hesitate"> 
        <MESSAGE HESITATE DELAY=HESITATIONDELAY> 
        <MODES> <TT> <TTMENU> 
        <EVENTS> 
            <TTMENU COLLECT STATE="echochoice"> 
            <TT INPUT="HELPTT" STATE="help"> 
            <TTFAIL DECREMENT=TTERRCOUNT STATE="notvalid"> 
            <TIMEOUT TIME="INACTIVITYTIME" STATE="inactivity"> 
        </EVENTS> 

324 <STATE NAME="inactivity"> 
        <IF EQ0="TOCOUNT"> 
            <MESSAGE MAXTOERRORMSG> 
            <NEW RESET> 
        <ELSE> 
            <MESSAGE INACTIVITY> 
            <MODES> <TT> <TTMENU> 
            <EVENTS> 
                <TTMENU COLLECT STATE="echochoice"> 
                <TT INPUT="HELPTT" STATE="help"> 
                <TTFAIL DECREMENT=TTERRCOUNT STATE="notvalid"> 
                <TIMEOUT TIME="INACTIVITYTIME" 
                    DECREMENT=TOCOUNT STATE="inactivity"> 
            </EVENTS> 
        </IF> 
FIG. 3C 



326 <STATE NAME="echochoice"> 
        <MESSAGE ECHOCHOICE> 
        <MODES> <TT> 
        <EVENTS> 
            <TT INPUT="RESETTT" RESET STATE="reset"> 
            <TT INPUT="HELPTT" STATE="help"> 
            <TTFAIL DECREMENT=TTERRCOUNT STATE="notvalid"> 
            <TIMEOUT TIME="CONFIRMTIME" DECREMENT=TOCOUNT 
                STATE="inactivity"> 
        </EVENTS> 

328 <STATE NAME="notvalid"> 
        <IF EQ0="TTERRCOUNT"> 
            <MESSAGE MAXTERRORMSG> 
            <NEW RESET> 
        <ELSE> 
            <MESSAGE NOTVALIDTTMSG> 
            <NEW STATE="hesitate"> 
        </IF> 

330 <STATE NAME="help"> 
        <MESSAGE HELPTT> 
        <NEW STATE="hesitate"> 

332 </INTERACTION> 
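
The menu definition of FIGS. 3A to 3C describes a finite state machine with the states initial, hesitate, inactivity, echochoice, notvalid and help, and two counters that bound the number of invalid inputs and timeouts. The Python sketch below is a minimal, hypothetical interpreter for those transitions, written only to make the control flow easier to follow. The event labels `"menu"`, `"help"`, `"fail"`, `"timeout"` and `"reset"` are assumed stand-ins for TTMENU COLLECT, TT INPUT="HELPTT", TTFAIL, TIMEOUT and TT INPUT="RESETTT". Timers and audio output are not modeled, the "reset" target is treated as terminal, and input that a state does not list is routed through the TTFAIL path; all three are assumptions of the sketch rather than statements of the specification.

```python
class MenuInteraction:
    """Toy interpreter for the menu state machine of FIGS. 3A to 3C."""

    # States that wait for a menu selection (modes TT and TTMENU).
    WAITING = {"initial", "hesitate", "inactivity"}

    def __init__(self, max_tt_error=3, max_to=2):
        self.tt_err_count = max_tt_error  # TTERRCOUNT, from MAXTTERROR
        self.to_count = max_to            # TOCOUNT, from MAXTO
        self.state = "initial"
        self.played = ["PROMPT"]          # names of messages played so far
        self.choice = None
        self.finished = False

    def _say(self, message):
        self.played.append(message)

    def _reset(self):
        # Assumption: RESET ends this interaction.
        self.state = "reset"
        self.finished = True

    def _enter_hesitate(self):
        self.state = "hesitate"
        self._say("HESITATE")

    def _enter_notvalid(self):
        # TTFAIL decrements TTERRCOUNT, then state "notvalid" tests it for zero.
        self.tt_err_count -= 1
        if self.tt_err_count == 0:
            self._say("MAXTERRORMSG")
            self._reset()
        else:
            self._say("NOTVALIDTTMSG")
            self._enter_hesitate()

    def _enter_inactivity(self, decrement):
        # TIMEOUT optionally decrements TOCOUNT, then "inactivity" tests it.
        if decrement:
            self.to_count -= 1
        if self.to_count == 0:
            self._say("MAXTOERRORMSG")
            self._reset()
        else:
            self.state = "inactivity"
            self._say("INACTIVITY")

    def handle(self, event, value=None):
        """Apply one event and return the resulting state name."""
        if self.finished:
            raise RuntimeError("interaction already finished")
        if event == "help":
            self._say("HELPTT")
            self._enter_hesitate()
        elif event == "fail":
            self._enter_notvalid()
        elif event == "timeout":
            # In the figures, "hesitate" is the one state whose TIMEOUT
            # carries no DECREMENT=TOCOUNT.
            self._enter_inactivity(decrement=self.state != "hesitate")
        elif event == "menu" and self.state in self.WAITING:
            self.choice = value
            self.state = "echochoice"
            self._say("ECHOCHOICE")
        elif event == "reset" and self.state == "echochoice":
            self.choice = None
            self._reset()
        else:
            # Assumption: input a state does not list counts as TTFAIL.
            self._enter_notvalid()
        return self.state


session = MenuInteraction()
for event, value in [("fail", None), ("timeout", None), ("menu", "2")]:
    session.handle(event, value)
print(session.state, session.choice, session.played)
```

Because every path either reaches a waiting state or decrements a counter that is tested against zero, a run of invalid keys or timeouts always terminates. This kind of property can be checked when the definition is expressed as a finite state machine.
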